Separate chat templates into a single file #33957

Rocketknight1 · 2024-10-04T17:46:25Z

We have several issues with chat templates because they're stored as single lines in the JSON config files:

Impossible to review diffs
Very hard to edit in the web UI (or in general)
Differences between processor templates in chat_template.json and tokenizer templates in tokenizer_config.json causing confusion
Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead

The solution:

Just move chat templates to a single chat_template.jinja file in the repo
If multiple templates are required, then they should still be stored in the JSON file. This is not supported for Processor classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
If a chat_template.jinja file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any chat_template entry in tokenizer_config.json.

For now, we continue saving in the old format by default. I'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.

Extremely draft PR for now, since it'll probably break lots of things!

TODO:

Add loading/saving single files in tokenizers
Add loading/saving single files in processors
Add tokenizer tests for saving + loading + push_to_hub working correctly
Add processor tests for saving + loading + push_to_hub working correctly
Ensure processors + tokenizers don't clash when both saving chat templates

HuggingFaceDocBuilderDev · 2024-10-04T18:11:22Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker · 2024-10-04T20:14:40Z

Interesting! I think we would need to be extra careful as this can influence quite a lot of other libs, to which we need to open PRs before merging this one!

FYI @LysandreJik and @Narsil

LysandreJik · 2024-10-09T10:17:24Z

Also cc @zucchini-nlp

CISC · 2024-10-09T20:35:29Z

Oh, wow, this will affect a lot of other projects. Also keep in mind that the HF API exposes the chat templates through the config.tokenizer_config entry in ModelInfo.

I will release a chat template editor (Gradio space) soon that lets you easily view, modify, test and submit PRs (just to address a couple of OP bullet-points), so thanks to @Rocketknight1 for pointing me towards this PR, I will need to keep a close eye on changes. :)

Rocketknight1 · 2024-10-10T13:33:33Z

@CISC yes, I'm aware that this could be disruptive! When this PR is ready we'll discuss it internally and with the community, and maybe we might end up aborting it at that point (and I'll try not to let that delay the inverse templates PR too much either way)

Rocketknight1 · 2024-10-10T15:02:52Z

This is ready for review! There's one major question: Right now I'm saving processor_chat_template.jinja and chat_template.jinja as separate files to ensure no conflicts, even though most repos should only have one.

We could consider always using chat_template.jinja for both, however in that case we would need to override the loading behaviour on the tokenizer, if it's the child of a processor, because if the processor saves a chat template file then the tokenizer will also load that and use it to set its own chat template attribute as well, which might have strange side effects.

Other than that, this should be ready for review! (cc some possibly affected people @zucchini-nlp @Narsil @xenova @LysandreJik)

Failing tests in UDOP seem to be unrelated and I can't reproduce them locally

zucchini-nlp

Seems good to me, but for processors now we'll have three ways of saving and loading templates. Some people (including repos maintained by us) still have it the old way, as part of processor_config.json. Introducing one more template file might be confusing for users, so we have to maybe add this somewhere in the docs. I mean when we finally decide to roll out saving/loading from chat_template.jinja files by default

Rocketknight1 · 2024-10-14T15:35:37Z

@zucchini-nlp - agreed, but hopefully we can deprecate the others! I hope when we're finally finished, we just have chat_template.jinja and processor_chat_template.jinja. I was considering merging both into chat_template.jinja but that creates lots of weird side effects, and I decided against it.

Rocketknight1 · 2024-10-16T15:27:31Z

Update: This should be ready for final review now! The failing UDOP tests are not related to this PR, and I have another PR open to fix them here #34180

To be clear, we will not move the chat templates when this PR is merged. This PR just adds support for saving/loading to the new location, but we will not save to the new file locations by default. This will give the many other frameworks that load chat templates time to adapt, and give users time to update to versions that support the new locations.

cc affected people: @Narsil @xenova @Wauplin

Wauplin · 2024-10-16T15:48:00Z

Thanks for the ping @Rocketknight1! I'll check what it implies server-side to make the parsing of the chat template available in the API.

Wauplin · 2024-10-16T16:32:07Z

High level feedback: to avoid any confusion in the future, shouldn't the long-term files be called tokenizer_chat_template.jinja and processor_chat_template.jinja?
Otherwise some model repos will end up with chat_template.json/processor_chat_template.jinja for processors and tokenizer_config.json/chat_template.jinja for tokenizers (for backward compatibility). Meaning that chat_completion.jinja and chat_completion.json won't have the same meaning.

Having a longer and more explicit final name would avoid confusion in my opinion.

Rocketknight1 · 2024-10-16T16:37:26Z

I think that makes sense, yes! It'll be a bit longer but very explicit about which files affect which classes. I'll make that change if other reviewers agree as well.

Wauplin · 2024-10-18T16:15:28Z

I was considering merging both into chat_template.jinja but that creates lots of weird side effects, and I decided against it.

Hi @Rocketknight1 @zucchini-nlp Getting back to this topic as I'd like to understand more the reasoning behind this file structure. I have a few questions:

What are the side effects in play if we set a single jinja file for both tokenizers and processors?
Apart from the text-only vs text+image(+<any modality?>) difference, is there another fundamental difference between a tokenizer and a processor? Can it happen that a VLM has 1 template for text-only and 1 template for text+image?
(maybe a stupid question) if we have extra information to store about a jinja template, could it make sense to define a header structure that could be added to jinja templates (as jinja comments) with high level metadata? typically to indicate if the template is suitable for tokenizers and/or processors (an alternative could be to use a ~~yaml~~ toml config file instead of pure jinja)
what are the main libraries/tools that have to be updated when we update this standard? At least transformers/transformers.js/TGI for sure. Any other one to think about?

Without knowing all the technical details, I'd advocate for a truly single file chat template if possible. If that's the case, we would support things differently server-side, typically having a single model_info.config.chat_template field instead of model_info.config.tokenizer_config.chat_template and model_info.config.processor_config.chat_template. Discussed it further with @julien-c who advocated to support only the (expected) final version on the Hub instead of the intermediate chat_template.json/... alternatives, to nudge model authors to use the "standard" asap (meaning reverting the PR I merged yesterday on moon-landing).

And sorry if all of this has already been discussed elsewhere!

julien-c · 2024-10-18T16:17:46Z

Discussed it further with @julien-c who advocated to support only the (expected) final version on the Hub instead of the intermediate chat_template.json/... alternatives, to nudge model authors to use the "standard" asap

100%

Rocketknight1 · 2024-10-18T16:19:44Z

Hi @Wauplin, 1. is a good question! The main reason I wanted separate templates is because I thought the side effects would be unpredictable. For example, many processors have processor.chat_template but tokenizer.chat_template is None. However, if we put all chat templates in chat_template.json, then when the Tokenizer is loaded for that model, it will also set processor.tokenizer.chat_template. This is probably safe, but it might break some stuff that I can't predict!

CISC · 2024-10-23T15:18:03Z

I've published the initial version of Chat Template Editor, ~~however the full functionality is currently blocked by the gr.LoginButton not working~~.

I've started work related to this and the inverse template PR, but it's currently disabled/hidden until finalized.

I have a small question I hope someone can provide an answer to however; this PR mentions processor chat templates, and I've tried locating repos containing them to see how they work and possibly add support for them, but I can't seem to find any?

Found community Pixtral which was very helpful, added support for chat_template.json and multimodal messages.

ArthurZucker

Let's go with a single file systeme and define a good standard for it. If the chat template stays as it is today, it does not have to be "in" the tokenizer: it's a new form of pre-processing that comes before the tokenzier and as such we don't need different file.

let's synch about what's the best format to use, the simpler the better

Rocketknight1 · 2024-11-05T19:51:06Z

I got waylaid by a couple of urgent tasks, but this should finally be ready! The summary is:

The only raw chat template file is chat_template.jinja, for both processors and tokenizers. This means that processors and tokenizers saved in the same repo cannot have different chat templates.
For now, we continue saving in the old format by default, but we support loading the new format. We keep it like that for a couple of versions so that when we start saving the new format it doesn't break everyone's experience.
When a tokenizer has multiple templates, it cannot be saved in the new format. This probably won't be changed - we want to gently deprecate multiple templates.

ArthurZucker

LGTM but the PR description needs to be updated!

Rocketknight1 · 2024-11-26T14:18:00Z

PR description updated, and merging!

Wauplin · 2024-11-27T10:39:26Z

Thanks @Rocketknight1 ! Is there a first repo example on the Hub to check that? Asking since we'll have to support this new (and definitive!) file in the server-side parsing

Rocketknight1 · 2024-11-27T13:44:41Z

cc @Wauplin not yet, but I'll make one in hf-internal-testing today!

* Initial draft * Add .jinja file loading for processors * Add processor saving of naked chat template files * make fixup * Add save-load test for tokenizers * Add save-load test for tokenizers * stash commit * Try popping the file * make fixup * Pop the arg correctly * Pop the arg correctly * Add processor test * Fix processor code * stash commit * Processor clobbers child tokenizer's chat template * Processor clobbers child tokenizer's chat template * make fixup * Split processor/tokenizer files to avoid interactions * fix test * Expand processor tests * Rename arg to "save_raw_chat_template" across all classes * Update processor warning * Move templates to single file * Move templates to single file * Improve testing for processor/tokenizer clashes * Improve testing for processor/tokenizer clashes * Extend saving test * Test file priority correctly * make fixup * Don't pop the chat template file before the slow tokenizer gets a look * Remove breakpoint * make fixup * Fix error

Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from b4cc131 to 0216160 Compare October 8, 2024 17:48

Rocketknight1 mentioned this pull request Oct 9, 2024

Add inverse chat templating #33321

Open

9 tasks

Rocketknight1 force-pushed the chat_template_files branch from 2b7c7ca to 3488357 Compare October 10, 2024 13:47

Rocketknight1 marked this pull request as ready for review October 10, 2024 15:02

zucchini-nlp reviewed Oct 11, 2024

View reviewed changes

Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from aef89c6 to 0c164ab Compare October 15, 2024 18:06

LysandreJik requested a review from ArthurZucker October 18, 2024 11:29

ArthurZucker reviewed Oct 29, 2024

View reviewed changes

Rocketknight1 force-pushed the chat_template_files branch 2 times, most recently from ef6a33c to a6ff538 Compare November 5, 2024 18:40

Rocketknight1 requested review from ArthurZucker and LysandreJik November 15, 2024 14:57

Rocketknight1 added 21 commits November 15, 2024 14:58

Fix processor code

743a4a5

stash commit

455a297

Processor clobbers child tokenizer's chat template

52614f3

Processor clobbers child tokenizer's chat template

a200a42

make fixup

5c792bc

Split processor/tokenizer files to avoid interactions

5305a87

fix test

3656468

Expand processor tests

e412619

Rename arg to "save_raw_chat_template" across all classes

bd81682

Update processor warning

50d374a

Move templates to single file

a81a7e1

Move templates to single file

f9127c1

Improve testing for processor/tokenizer clashes

3878828

Improve testing for processor/tokenizer clashes

74cd295

Extend saving test

cf577e1

Test file priority correctly

b97e76d

make fixup

e5b8b76

Don't pop the chat template file before the slow tokenizer gets a look

5b40f04

Remove breakpoint

2159cdc

make fixup

5b88768

Fix error

1187047

Rocketknight1 force-pushed the chat_template_files branch from a6ff538 to 1187047 Compare November 15, 2024 14:58

ArthurZucker approved these changes Nov 22, 2024

View reviewed changes

Rocketknight1 merged commit d5cf91b into main Nov 26, 2024
27 checks passed

Rocketknight1 deleted the chat_template_files branch November 26, 2024 14:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate chat templates into a single file #33957

Separate chat templates into a single file #33957

Rocketknight1 commented Oct 4, 2024 •

edited

Loading

HuggingFaceDocBuilderDev commented Oct 4, 2024

ArthurZucker commented Oct 4, 2024 •

edited

Loading

LysandreJik commented Oct 9, 2024

CISC commented Oct 9, 2024

Rocketknight1 commented Oct 10, 2024

Rocketknight1 commented Oct 10, 2024

zucchini-nlp left a comment

Rocketknight1 commented Oct 14, 2024

Rocketknight1 commented Oct 16, 2024

Wauplin commented Oct 16, 2024

Wauplin commented Oct 16, 2024 •

edited

Loading

Rocketknight1 commented Oct 16, 2024

Wauplin commented Oct 18, 2024

julien-c commented Oct 18, 2024

Rocketknight1 commented Oct 18, 2024 •

edited

Loading

CISC commented Oct 23, 2024 •

edited

Loading

ArthurZucker left a comment

Rocketknight1 commented Nov 5, 2024

ArthurZucker left a comment

Rocketknight1 commented Nov 26, 2024

Wauplin commented Nov 27, 2024

Rocketknight1 commented Nov 27, 2024

Separate chat templates into a single file #33957

Separate chat templates into a single file #33957

Conversation

Rocketknight1 commented Oct 4, 2024 • edited Loading

HuggingFaceDocBuilderDev commented Oct 4, 2024

ArthurZucker commented Oct 4, 2024 • edited Loading

LysandreJik commented Oct 9, 2024

CISC commented Oct 9, 2024

Rocketknight1 commented Oct 10, 2024

Rocketknight1 commented Oct 10, 2024

zucchini-nlp left a comment

Choose a reason for hiding this comment

Rocketknight1 commented Oct 14, 2024

Rocketknight1 commented Oct 16, 2024

Wauplin commented Oct 16, 2024

Wauplin commented Oct 16, 2024 • edited Loading

Rocketknight1 commented Oct 16, 2024

Wauplin commented Oct 18, 2024

julien-c commented Oct 18, 2024

Rocketknight1 commented Oct 18, 2024 • edited Loading

CISC commented Oct 23, 2024 • edited Loading

ArthurZucker left a comment

Choose a reason for hiding this comment

Rocketknight1 commented Nov 5, 2024

ArthurZucker left a comment

Choose a reason for hiding this comment

Rocketknight1 commented Nov 26, 2024

Wauplin commented Nov 27, 2024

Rocketknight1 commented Nov 27, 2024

Rocketknight1 commented Oct 4, 2024 •

edited

Loading

ArthurZucker commented Oct 4, 2024 •

edited

Loading

Wauplin commented Oct 16, 2024 •

edited

Loading

Rocketknight1 commented Oct 18, 2024 •

edited

Loading

CISC commented Oct 23, 2024 •

edited

Loading